Last Lecture: 

Introduction to the Coalescent 


Coalescent approach 

Proceed backwards through time. 
Genealogy of a sample of sequences. 

Infinite sites model 

All mutations distinguishable. 

No reverse mutation. 







Some key ideas ... 


Probability of coalescence events 
Length of genealogy and its branches 
Expected number of mutations 


Parameter θ, which combines population 
size and mutation rate 







Building Blocks... 


Probability of sampling distinct ancestors 
for n sequences 
Coalescence time t is approximately 
exponentially distributed 
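
A minimal sketch of the first building block, assuming a population with 2N gene copies (diploid) and non-overlapping generations. The function names are illustrative and not from the lecture. It compares the exact probability that n sequences have n distinct ancestors in the previous generation with the usual first-order approximation, which is what makes the coalescence time approximately exponential.

```python
from math import comb


def prob_distinct_ancestors(n, num_gene_copies):
    """Exact probability that n sequences pick n distinct parents.

    num_gene_copies is assumed to be 2N for a diploid population.
    """
    p = 1.0
    for i in range(1, n):
        # The (i+1)-th sequence must avoid the i parents already chosen.
        p *= 1.0 - i / num_gene_copies
    return p


def prob_distinct_ancestors_approx(n, num_gene_copies):
    """First-order approximation: 1 - C(n, 2) / (2N)."""
    return 1.0 - comb(n, 2) / num_gene_copies


if __name__ == "__main__":
    for n in (2, 4, 10):
        print(n,
              prob_distinct_ancestors(n, 20_000),
              prob_distinct_ancestors_approx(n, 20_000))
```

For small n relative to 2N the two values agree closely; the approximation degrades as n grows.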












Some Key Results... 


Coalescence Time (population size units) 
Some More Key Results ... 


Expected Number of Polymorphisms 

For a diploid sample 
E(S) = 4N\mu \sum_{i=1}^{n-1} 1/i = \theta \sum_{i=1}^{n-1} 1/i 

For a haploid sample 
E(S) = 2N\mu \sum_{i=1}^{n-1} 1/i = \theta \sum_{i=1}^{n-1} 1/i 
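
As a quick numerical check of the expectation above, here is a small sketch. The function name is illustrative; the example values N = 10,000 and a mutation rate of 10^-4 are the ones used in this lecture's parameter slide, assuming a diploid population so that θ = 4Nμ.

```python
def expected_segregating_sites(theta, n):
    """E(S) = theta * sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return theta * sum(1.0 / i for i in range(1, n))


if __name__ == "__main__":
    theta = 4 * 10_000 * 1e-4  # diploid: theta = 4 * N * mu
    for n in (2, 10, 100):
        print(n, expected_segregating_sites(theta, n))
```

Because the sum grows only logarithmically in n, adding sequences to an already large sample reveals few additional polymorphic sites.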








Inferences about θ 


Could be estimated from S 

Divide by expected length of genealogy 
Could then be used to: 

Estimate N, if mutation rate μ is known 
Estimate μ, if population size N is known 
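
The estimator described above (divide S by the expected length of the genealogy) is commonly called Watterson's estimator. A minimal sketch follows, assuming a diploid population so that θ = 4Nμ; the function names are illustrative.

```python
def watterson_theta(num_segregating_sites, n):
    """Estimate theta as S / sum_{i=1}^{n-1} 1/i."""
    a_n = sum(1.0 / i for i in range(1, n))
    return num_segregating_sites / a_n


def population_size_from_theta(theta, mu):
    """Solve theta = 4 * N * mu for N (diploid assumption)."""
    return theta / (4.0 * mu)


def mutation_rate_from_theta(theta, population_size):
    """Solve theta = 4 * N * mu for mu (diploid assumption)."""
    return theta / (4.0 * population_size)


if __name__ == "__main__":
    theta_hat = watterson_theta(num_segregating_sites=11, n=10)
    print(theta_hat, population_size_from_theta(theta_hat, mu=1e-4))
```

Only the product Nμ is identifiable from S, which is why one of the two quantities has to be known from elsewhere.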









Alternative Estimator for θ ... 


Count pairwise differences between 
sequences 


Compute average number of differences 
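
A minimal sketch of this estimator, assuming the sequences are aligned strings of equal length. The function name and the toy sequences are made up for illustration.

```python
from itertools import combinations


def mean_pairwise_differences(sequences):
    """Average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(sequences, 2))
    total = 0
    for first, second in pairs:
        # Count the sites at which this pair of sequences differs.
        total += sum(a != b for a, b in zip(first, second))
    return total / len(pairs)


if __name__ == "__main__":
    sample = ["AAAA", "AAAT", "AATT"]  # toy data, not from the lecture
    print(mean_pairwise_differences(sample))
```

Under the infinite sites model, the expected number of differences between two sequences is θ, so this average is itself an estimate of θ.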
Var(θ̂) as a function of n 

Parameters 

N = 10,000 individuals 
μ = 10^-4 
θ = 4 

Sample Size 
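
The plot itself is not recoverable here, so the sketch below only tabulates the standard no-recombination variance formulas for the two estimators of θ (the one based on S and the one based on average pairwise differences). Which estimator the original figure showed is an assumption; both are printed. Function names are illustrative.

```python
def watterson_variance(theta, n):
    """Variance of S / a_n, with a_n = sum 1/i and b_n = sum 1/i^2."""
    a_n = sum(1.0 / i for i in range(1, n))
    b_n = sum(1.0 / i ** 2 for i in range(1, n))
    return theta / a_n + b_n * theta ** 2 / a_n ** 2


def pairwise_variance(theta, n):
    """Variance of the average number of pairwise differences."""
    linear = (n + 1) * theta / (3.0 * (n - 1))
    quadratic = 2.0 * (n * n + n + 3) * theta ** 2 / (9.0 * n * (n - 1))
    return linear + quadratic


if __name__ == "__main__":
    theta = 4.0  # value from the parameter list above
    for n in (2, 5, 10, 50, 100):
        print(n,
              round(watterson_variance(theta, n), 2),
              round(pairwise_variance(theta, n), 2))
```

The variance of the S-based estimator keeps shrinking (slowly) as n grows, while the variance of the pairwise estimator levels off at a positive value.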



Today ... 


More applications of the coalescent 
Predicting allele frequency distributions 

Using simulations 

The full distribution of S 

Using analytical calculations 







A Coalescent Simulation ... 


Let’s consider tracing the ancestry of 4 
sequences 














When n = 4 


Probability of Coalescent Event 
Time to Next Coalescent Event 
Sample time from exponential distribution 
Pick two sequences at random to coalesce 



















Next n = 3 


Let’s assume that sequences 3 and 4 are selected ... 
Then, we repeat the process for a sample of 3 sequences 
















Next n = 2 


Let’s assume that sequences 1 and 2 are selected to coalesce 
Then, we repeat the process for a sample of 2 sequences 
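
The steps above (sample a waiting time, pick two sequences at random, repeat with one fewer lineage) can be sketched as follows. This is a minimal illustration, assuming time is measured in units of 2N generations so that each pair of lineages coalesces at rate 1; the function name and the event tuple layout are made up for this example.

```python
import random


def simulate_genealogy(n, rng):
    """Simulate one coalescent genealogy for n sequences.

    Returns a list of events (k, wait, pair, new_label): k lineages were
    present, they waited `wait` time units, then `pair` merged into
    `new_label`.
    """
    lineages = list(range(1, n + 1))
    next_label = n + 1
    events = []
    while len(lineages) > 1:
        k = len(lineages)
        rate = k * (k - 1) / 2.0           # C(k, 2) pairs, each at rate 1
        wait = rng.expovariate(rate)       # time to next coalescent event
        a, b = rng.sample(lineages, 2)     # pick two lineages at random
        lineages.remove(a)
        lineages.remove(b)
        lineages.append(next_label)        # their common ancestor
        events.append((k, wait, (a, b), next_label))
        next_label += 1
    return events


if __name__ == "__main__":
    for event in simulate_genealogy(4, random.Random(1)):
        print(event)
```

The last event always involves 2 lineages and, on average, takes as long as all earlier events combined, which is why the deepest branches tend to be the longest.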

















The Simulated Coalescent 


At this point, we could 
place mutations on the 
genealogy. Most often, 
these would fall on the 
longer branches. 



















A Coalescent Simulation 


Mutations in these 
branches affect 
a pair of sequences 
A Coalescent Simulation ... 


Mutations in these 
branches affect 
a single sequence 


T(2) 





















Frequency Spectrum 


Repeating the simulation multiple times would 
give us a predicted mutation spectrum. 
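
A sketch of how repeated simulations yield a spectrum. Instead of sampling individual mutations, it uses a shortcut: mutations fall on branches in proportion to branch length, so it accumulates the total length of branches that sit above exactly i sampled sequences. Time is assumed to be in units of 2N generations, and the function name is illustrative.

```python
import random


def simulated_spectrum(n, replicates, rng):
    """Proportion of mutations expected in 1, 2, ..., n-1 of n sequences."""
    length_by_class = [0.0] * n  # index i: branch length above i sequences
    for _ in range(replicates):
        descendants = [1] * n    # each lineage starts as one sampled sequence
        while len(descendants) > 1:
            k = len(descendants)
            wait = rng.expovariate(k * (k - 1) / 2.0)
            for d in descendants:
                length_by_class[d] += wait
            i, j = rng.sample(range(k), 2)
            merged = descendants[i] + descendants[j]
            descendants = [d for idx, d in enumerate(descendants)
                           if idx not in (i, j)]
            descendants.append(merged)
    total = sum(length_by_class[1:])
    return [length / total for length in length_by_class[1:]]


if __name__ == "__main__":
    spectrum = simulated_spectrum(10, 5000, random.Random(42))
    for count, share in enumerate(spectrum, start=1):
        print(count, round(share, 3))
```

Note that the mutation rate never appears: it scales the number of mutations but not the shape of the spectrum.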
Frequency Spectrum (n = 10) 
Frequency Spectrum (n = 100) 
Frequency (out of n) 











Frequency Spectrum 


Constant size population 
Exponentially growing population 


Most variants are rare 

For n = 100, ~44% of variants occur in ≤ 5 of 100 sequences. 
For n = 10, ~35% of variants are observed once. 
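
These percentages can be checked against the standard expectation for a neutral, constant-size population, in which the share of variants present in i of n sequences is proportional to 1/i. That formula is not stated on the slide, so treat the sketch below as a consistency check under that assumption; the function name is illustrative.

```python
def neutral_spectrum(n):
    """Expected share of variants seen in i = 1, ..., n-1 of n sequences."""
    a_n = sum(1.0 / i for i in range(1, n))
    return [1.0 / (i * a_n) for i in range(1, n)]


if __name__ == "__main__":
    spec10 = neutral_spectrum(10)
    spec100 = neutral_spectrum(100)
    print("n = 10, observed once:", round(spec10[0], 3))
    print("n = 100, in at most 5 of 100:", round(sum(spec100[:5]), 3))
```

Both figures agree with the slide only when the n = 100 class includes variants seen in exactly 5 sequences.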








Mutation Spectrum 


Depends on genealogy 

Population Size 
Population Growth 
Population Subdivision 

Does not depend on 

Mutation rate! 







Deviations from Neutral Spectrum 


When would you expect deviations from 
the spectra we described? 

What would you expect for... 

A rapidly growing population? 

A population whose size is decreasing? 


Why? 







Effect of Polymorphism Type 

Number of Mutations 


Can be derived from coalescent tree 

What are the key features? 

Analytical results possible 

Trace back in time until MRCA, tracking 
mutation events 







Sample of Two Sequences 


Track coalescences and mutations 

Probability of a coalescent event? 

• Depends on population size ... 

Probability of a mutation? 

• Depends on mutation rate ... 

Proceed backwards until either occurs... 

Conditional probability for each outcome? 







Two Identical Sequences 
Full distribution of S... 


Probability that first j events are mutations... 
Example... 


2 sequences 

Population size N = 25,000 
Mutation rate μ = 10^-5 


Probability of 0, 1, 2, 3, ... mutations 
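
A sketch of this example, assuming a diploid population so that θ = 4Nμ = 1. Going back in time, the next event is a mutation with probability θ/(1 + θ) and a coalescence otherwise, so the number of mutations before the two sequences coalesce is geometric. The function name is illustrative.

```python
def prob_mutations_two_sequences(j, theta):
    """P(S = j) for two sequences: j mutations, then a coalescence."""
    p_mutation_first = theta / (1.0 + theta)
    return p_mutation_first ** j * (1.0 - p_mutation_first)


if __name__ == "__main__":
    theta = 4 * 25_000 * 1e-5  # diploid assumption: theta = 4 * N * mu
    for j in range(6):
        print(j, prob_mutations_two_sequences(j, theta))
```

With θ = 1, each additional mutation halves the probability, and two identical sequences (S = 0) are the single most likely outcome.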








And for multiple sequences... 

Describe number of mutations until the 
next coalescence event 

Proceed back in time, until: 

One of n sequences mutates... 

A coalescent event occurs... 

• Then track mutations in (n-1) sequences 








Formulae 
Example... 


3 sequences 

Population size N = 25,000 
Mutation rate μ = 10^-5 


Probability of 0, 1, 2, 3, ... mutations 
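
The formulae slide did not survive extraction, so the sketch below is a reconstruction under stated assumptions rather than the lecture's own derivation. While k lineages remain, the next event is taken to be a mutation with probability θ/(θ + k - 1), so the number of mutations in each phase is geometric, and the distribution of S is the convolution of the phases k = n, n-1, ..., 2. A diploid population (θ = 4Nμ) is assumed and the function names are illustrative.

```python
def mutations_while_k_lineages(theta, k, max_j):
    """P(j mutations occur before the next coalescence), j = 0..max_j."""
    p_mutation_first = theta / (theta + k - 1.0)
    return [p_mutation_first ** j * (1.0 - p_mutation_first)
            for j in range(max_j + 1)]


def distribution_of_s(theta, n, max_j):
    """P(S = j) for j = 0..max_j in a sample of n sequences."""
    total = [1.0] + [0.0] * max_j  # start with zero mutations for certain
    for k in range(n, 1, -1):
        step = mutations_while_k_lineages(theta, k, max_j)
        # Convolve the running total with this phase's distribution.
        total = [sum(total[i] * step[j - i] for i in range(j + 1))
                 for j in range(max_j + 1)]
    return total


if __name__ == "__main__":
    theta = 4 * 25_000 * 1e-5
    for j, p in enumerate(distribution_of_s(theta, 3, 5)):
        print(j, round(p, 4))
```

For n = 2 this reduces to the two-sequence geometric distribution, and its mean matches the expected number of polymorphisms, θ times the sum of 1/i.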







Number of Mutations 



So far 


One homogeneous population 

Coalescence times 
Number of mutations 

• Expectation 

• Distribution 

Spectrum of mutations 

Several assumptions, including ... 

No recombination 








Recombination ... 


No recombination 

Single genealogy 

Free recombination 

Two independent genealogies 
Same population history 

Intermediate case 

Correlated genealogies 